[57] ABSTRACT

A method and apparatus for hierarchical character recogni- 
tion processing of ambiguous and noisy characters which 
produces highly reliable results at high levels of hierarchical 
processing. The invention first applies a universal classifier 
system (which may comprise one or more universal 
classifiers) to input image data, and identifies "suspicious" 
characters. The image data for suspicious characters is then 
applied to a "specialist" classifier that is designed to handle 
only a narrow and well-defined set of recognition cases. This 
hierarchical processing architecture and method results in 
increased accuracy of recognition. The method is particu- 
larly applicable to handwritten characters and to distorted 
and noisy machine-printed characters. 

3 Claims, 2 Drawing Sheets 
HIERARCHICAL CHARACTER
RECOGNITION SYSTEM 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to character recognition systems. 

2. Description of Related Art 

Character recognition, such as optical character recogni- 
tion (OCR), involves scanning of documents and automated 
recognition of machine-printed or handwritten characters. 
FIG. 1 is a block diagram of a typical prior art OCR system. 
Documents 1 are passed through a scanner 2 which gener- 
ates image data 3. The image data 3 is applied to a processor 
4 (such as a general purpose computer) suitably pro- 
grammed with character recognition computer programs 
(while most current OCR systems are software based, an 
equivalent system can be implemented completely in 
hardware). 

The processor 4 produces a set of characters (typically 
coded in ASCII) of some or all of the scanned document as 
output. The character recognition system has to locate fields 
of interest (which may be the whole document, as in the case 
of typed pages) in the scanned image data 3, extract indi- 
vidual characters from the fields of interest, recognize these 
characters, and produce codes for each of the recognized 
characters. 

Real-world images and characters frequently suffer from 
a number of degradations (such as worn out typewriter 
ribbons, missing pins in dot-matrix printers, skipping ball- 
point pens, poor quality handwriting, deficiencies in the 
scanning process, etc.). Accordingly, a character recognition 
system must be able to provide a degree of confidence in its 
results to be of practical use. This degree of confidence can 
relate to the recognized document or fields in the document, 
but certainly must be present for individual recognized 
characters. With ambiguous or noisy characters, an OCR 
system can assign several potential identities to the image 
data comprising a character. These identities are usually rank 
ordered by a confidence factor, so that the most probable 
identity of the character has the highest confidence, the next 
most probable identity has the next highest confidence, etc. 
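
The rank ordering described above can be illustrated with a minimal Python sketch. The `Candidate` type, the `rank_candidates` function, and the numeric scores are hypothetical names and values chosen for illustration; the text does not specify how confidence factors are computed or represented.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    char: str          # proposed identity for the character image
    confidence: float  # assumed score in [0, 1]; higher means more probable


def rank_candidates(scores: dict[str, float]) -> list[Candidate]:
    """Return candidate identities ordered from most to least probable."""
    return sorted(
        (Candidate(char, score) for char, score in scores.items()),
        key=lambda cand: cand.confidence,
        reverse=True,
    )


# Illustrative scores for a noisy handwritten character (made-up values).
ranked = rank_candidates({"9": 0.45, "4": 0.48, "A": 0.07})
best = ranked[0]
```

In this sketch a small gap between the first and second candidates (here "4" and "9") is the kind of signal a system could use to report low confidence in an individual character.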

Traditionally, OCR programs have been designed and 
utilized as single pass, single classifier systems, an example 
of which is shown in FIG. 2. Image data 3 is applied to a 
"universal" classifier 5 which outputs machine-readable 
data, typically in ASCII 6 form. A universal classifier is 
designed to recognize a large set of characters such as letters, 
numbers, or alphanumeric characters. A drawback of single 
pass, single classifier systems is that recognition frequently 
fails when the classifier is confronted with ambiguous 
characters (e.g., "I", "l", and "1") or "noisy" (i.e., poorly 
formed) characters. 

More recently, OCR systems have utilized multiple uni- 
versal classifiers in conjunction with a "voting" algorithm to 
select the output of one of the classifiers. FIG. 3 is a block 
diagram of a prior art multiple universal classifier system, in 
which image data 3 is applied to some or all of n universal 
classifiers 5, the outputs of which are coupled to a voting 
function 7. The universal classifiers 5 are trained for differ- 
ent characteristics or use different recognition algorithms. 
The voting function 7 may be any one of several algorithms 
which compare or combine the outputs of the universal 
classifiers 5 to arrive at a (presumably) more reliable char- 
acter recognition. The voting function 7 then outputs a 
character code 6. While multiple universal classifier systems 
give improved recognition compared to single pass, single 
classifier systems, such systems would benefit from further 
improvements. 
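
As a rough illustration of a voting function of this kind, the following Python sketch uses a simple plurality vote. This is only one of many possible algorithms and is an assumption made for illustration; the text says only that the voting function compares or combines the classifier outputs. The name `plurality_vote` and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def plurality_vote(outputs: list[str]) -> str:
    """Return the character proposed by the largest number of classifiers.

    Ties are broken in favor of the earliest classifier in the list,
    an arbitrary choice made for this sketch.
    """
    if not outputs:
        raise ValueError("at least one classifier output is required")
    counts = Counter(outputs)
    top = max(counts.values())
    for char in outputs:
        if counts[char] == top:
            return char
    raise AssertionError("unreachable: some output always has the top count")


# Three universal classifiers disagree on a handwritten digit.
voted = plurality_vote(["4", "9", "4"])
```

A plurality vote cannot help when every universal classifier makes the same mistake on an ambiguous shape, which is the gap the specialist classifiers described later are meant to address.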

Accordingly, the inventor has recognized a need for a 
better character recognition system. The present invention 
meets this need. 

SUMMARY OF THE INVENTION 

The invention comprises a method and means for hierarchical
character recognition processing of ambiguous and noisy
characters which produces highly reliable results at high
levels of hierarchical processing. The invention first applies
a universal classifier system (which may comprise one or more
universal classifiers) to input image data, and identifies
"suspicious" characters. A suspicious character may be
determined based upon any desired criteria, such as apparent
size, the type of character, level of gray in the image of the
character, styles of handwriting, prior knowledge that a
character candidate has been "surgically" separated from an
adjoining character, or upon assignment of a character
candidate by the universal classifier system to a predefined
character group known to be ambiguous (e.g., the pair "4" and
"9"; the group "I", "l", "1", etc.). The image data for
suspicious characters is then applied to a "specialist"
classifier that is designed to handle only a narrow and
well-defined set of recognition cases. These specialist
classifiers normally have only a few outputs (for example, 2 or
3 possible characters). This hierarchical processing
architecture and method results in increased accuracy of
recognition. The method is particularly applicable to
handwritten characters and to distorted and noisy
machine-printed characters.
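
The two-level processing summarized above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `recognize`, `find_ambiguity_class`, and `AMBIGUITY_CLASSES`, the image representation, and the stub classifiers are all hypothetical, and membership in a predefined ambiguity class is used as the only "suspicious" criterion (the text lists several others).

```python
from typing import Callable, Optional, Sequence

Image = Sequence[Sequence[int]]      # assumed grayscale pixel grid
Classifier = Callable[[Image], str]  # maps image data to a character

# Assumed ambiguity classes; the text names "4"/"9" and "I"/"l"/"1".
AMBIGUITY_CLASSES = [frozenset("49"), frozenset("Il1")]


def find_ambiguity_class(char: str) -> Optional[frozenset]:
    """Return the ambiguity class containing char, or None if unambiguous."""
    for cls in AMBIGUITY_CLASSES:
        if char in cls:
            return cls
    return None


def recognize(
    image: Image,
    universal: Classifier,
    specialists: dict[frozenset, Classifier],
) -> str:
    """Apply the universal classifier, then a specialist if suspicious."""
    probable = universal(image)
    cls = find_ambiguity_class(probable)
    if cls is None or cls not in specialists:
        return probable              # not suspicious: output directly
    return specialists[cls](image)   # suspicious: defer to the specialist


# Stub classifiers standing in for trained models (illustration only).
def stub_universal(image: Image) -> str:
    return "4"


def stub_four_nine(image: Image) -> str:
    # Toy rule: a dark top-center pixel suggests a closed loop, i.e. "9".
    return "9" if image[0][1] > 0 else "4"


specialists = {frozenset("49"): stub_four_nine}
result = recognize([[0, 1, 0]], stub_universal, specialists)
```

Keying the specialists by ambiguity class rather than by individual character means that a universal result of either "4" or "9" reaches the same specialist, and that the specialist may confirm or overturn the universal classifier's choice.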

The details of the preferred embodiment of the invention 
are set forth in the accompanying drawings and the descrip- 
tion below. Once the details of the invention are known, 
numerous additional innovations and changes will become 
obvious to one skilled in the art. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a typical prior art OCR 
system. 

FIG. 2 is a block diagram showing a prior art single pass, 
single classifier system. 

FIG. 3 is a block diagram of a prior art multiple universal 
classifier system. 

FIG. 4 is a block diagram showing the basic architecture 
of the invention. 

FIG. 5 is a flow chart showing one embodiment of the 
invention. 

Like reference numbers and designations in the various 
drawings indicate like elements. 

DETAILED DESCRIPTION OF THE 
INVENTION 

Throughout this description, the preferred embodiment 
and examples shown should be considered as exemplars, 
rather than as limitations on the invention. 

Overview 

The shapes of modern characters have evolved over millennia
into a number of classes of morphologically similar characters.
Unfortunately, similarity exists even between characters of
different classes sufficient to create ambiguities in the
recognition of such classes. For example, "I", "l", and "1"
have rather similar shapes to a recognition system, as do "S"
and "5", "4" and "9", "3" and "8", and several other
"ambiguity" classes. The correct identification of such
ambiguous characters requires extensive recognition
capabilities or the presence of context. The larger the set of
characters, the more ambiguity classes it possesses. A
universal classifier that is designed to recognize a full set of
characters (such as all alphabetic characters and/or all
numeric characters) is regularly overwhelmed by morphologically
similar characters that humans normally assign to different
classes.

To overcome this problem with universal classifiers, the
invention employs a hierarchy of "specialist" classifiers, each
configured to recognize characters belonging to distinct
ambiguity classes. That is, each special classifier is trained
or built using known principles to distinguish only the
differences between characters in an ambiguity class (e.g., "4"
and "9"); the special classifiers are not designed to process
characters of other shapes. A classifier of this kind is trained
(built) only on a large set of characters that belong to a
specific ambiguity class.

The specialist classifiers may be implemented in any desired
fashion, using, for example, feature extraction algorithms such
as neural networks and syntactic or linguistic algorithms,
"nearest neighbor" algorithms, and other algorithms known in
the art of character recognition. The distinctive feature of the
invention is that the full power of such methods is brought to
bear on a specific ambiguity class (e.g., "3" and "8", etc.).

EXAMPLE EMBODIMENT

FIG. 4 is a block diagram showing the basic architecture of the
invention. Image data 3 is applied to a universal classifier
system 8, which can comprise one or more universal classifiers
of the types known in the prior art. For unambiguous data, the
universal classifier system 8 outputs a character code 6.

However, added to the universal classifier system 8 is the
ability to "call" a specialist classifier 9 trained or built to
recognize ambiguous image data supplied from the universal
classifier system 8. Any particular specialist classifier 9 is
selected based upon the probable identity of a candidate
character, as determined by the universal classifier system 8,
and whether the candidate character is "suspicious". A
suspicious character may be determined based upon any desired
criteria, such as apparent size, the type of character, level of
gray in the image of the character, styles of handwriting, prior
knowledge that a character candidate has been "surgically"
separated from an adjoining character, or upon assignment of a
character candidate by the universal classifier system to a
predefined character group known to be ambiguous (e.g., the
pair "4" and "9"; the group "I", "l", "1", etc.). The "called"
specialist classifier 9 analyzes the image data by performing a
recognition algorithm tailored to the candidate character and
then outputs a probable character code 6.

FIG. 5 is a flow chart showing one embodiment of the invention.
One or more universal classifiers 8 are applied to image data 3
to generate a probable character (step 100). A determination is
then made as to whether the character is "suspicious". For
example, the character may have been recognized by a universal
classifier system 8 as probably being a "4". The universal
classifier system 8 can be pre-programmed, for example, to
always recognize that a "4" is a suspicious character because
it is often mistaken for a "9" (particularly with handwritten
characters). Thus, the candidate character is part of the
ambiguity class containing "4" and "9".

If the character is not suspicious, then the probable character
determined by the universal classifier system 8 is output as a
code 6 (step 104). If the character is suspicious (step 102), a
specialist classifier 9 for the suspicious character is selected
(step 106). The selected specialist classifier 9 is then applied
to that image data 3 (step 108) and determines the most probable
character to be assigned to the image data 3. That character is
then output as a code 6 (step 110). Note that the character
determined by the selected specialist classifier may be the same
character determined as being most probable by the universal
classifier system 8.

Implementation

The invention may be implemented in hardware (digital, analog,
or hybrid digital-analog) or software, or a combination of both.
However, preferably, the invention is implemented in computer
programs executing on programmable computers each comprising at
least one processor, a data storage system (including volatile
and non-volatile memory and/or storage elements), at least one
input device, and at least one output device. Program code is
applied to input data to perform the functions described herein
and generate output information. The output information is
applied to one or more output devices, in known fashion.

Each program is preferably implemented in a high level
procedural or object oriented programming language to
communicate with a computer system. However, the programs can
be implemented in assembly or machine language, if desired. In
any case, the language may be a compiled or interpreted
language.

Each such computer program is preferably stored on a storage
media or device (e.g., ROM or magnetic diskette) readable by a
general or special purpose programmable computer, for
configuring and operating the computer when the storage media
or device is read by the computer to perform the procedures
described herein. The inventive system may also be considered
to be implemented as a computer-readable storage medium,
configured with a computer program, where the storage medium so
configured causes a computer to operate in a specific and
predefined manner to perform the functions described herein.

In one implementation of the invention, use of specialist
classifiers improved the error rate by about 40% compared to
the same system without specialist classifiers.

A number of embodiments of the invention have been described.
Nevertheless, it will be understood that various modifications
may be made without departing from the spirit and scope of the
invention. Accordingly, it is to be understood that the
invention is not to be limited by the specific illustrated
embodiment, but only by the scope of the appended claims.
What is claimed is: 

1. A hierarchical recognition system including: 

(a) a universal classifier system for recognizing a probable 
character from input image data, for determining if the 
probable character is a suspicious character, and for 
outputting a character code for each non-suspicious 
character; 

(b) at least one specialist classifier, each for recognizing 
a distinct ambiguity class of characters, and selectably 
coupled to the universal classifier system; 

(c) means for selecting and applying a specialist classifier 
corresponding to each suspicious character, whereby 
the selected specialist classifier outputs a character 
code for each suspicious character. 

2. A method for recognizing characters from input image 
data, comprising the steps of: 

(a) recognizing a probable character from the input image 
data; 

(b) determining if the probable character is a suspicious 
character; 

(c) outputting a character code for each non-suspicious 
character, and otherwise selecting a specialist classifier, 
configured to recognize a distinct ambiguity class of 
characters, corresponding to each suspicious character; 



(d) recognizing each suspicious character with the 
selected specialist classifier; and 

(e) outputting a character code for each suspicious char- 
acter. 

3. A computer program, residing on a computer-readable 
medium, for recognizing characters from input image data, 
the computer program comprising instructions for causing a 
computer to: 

(a) recognize a probable character from the input image 
data; 

(b) determine if the probable character is a suspicious 
character; 

(c) output a character code for each non-suspicious 
character, and otherwise select a specialist classifier, 
configured to recognize a distinct ambiguity class of 
characters, corresponding to each suspicious character; 

(d) recognize each suspicious character with the selected 
specialist classifier; and 

(e) output a character code for each suspicious character. 